Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems
Humans tend to change their way of speaking when they are immersed in a noisy
environment, a reflex known as the Lombard effect. Current speech enhancement
systems based on deep learning do not usually take into account this change in
the speaking style, because they are trained with neutral (non-Lombard) speech
utterances recorded under quiet conditions to which noise is artificially
added. In this paper, we investigate the effects that the Lombard reflex has on
the performance of audio-visual speech enhancement systems based on deep
learning. The results show a performance gap of up to approximately 5 dB
between the systems trained on neutral speech and those trained on Lombard
speech. This indicates the benefit of accounting for the mismatch between
neutral and Lombard speech in the design of audio-visual speech enhancement
systems.
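The abstract notes that training data for such systems is usually built by artificially adding noise to neutral speech recorded in quiet conditions. Below is a minimal sketch of that mixing step, assuming the noise is rescaled to hit a chosen signal-to-noise ratio (SNR) in dB before being added. The function names and the sine-tone stand-in for a speech utterance are hypothetical and are not taken from the paper.

```python
import math
import random

def mean_power(signal):
    """Average power (mean of squared samples) of a list of floats."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) == target_snr_db, then add it to `clean`.

    Assumes equal-length lists of float samples and non-silent noise.
    Returns (noisy_mixture, scaled_noise).
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # Choose gain g so that P_clean / (g**2 * P_noise) equals 10**(target_snr_db / 10)
    gain = math.sqrt(mean_power(clean) / (mean_power(noise) * 10 ** (target_snr_db / 10)))
    scaled_noise = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise

def measured_snr_db(clean, noise):
    """SNR in dB between a clean signal and the noise actually added to it."""
    return 10 * math.log10(mean_power(clean) / mean_power(noise))

if __name__ == "__main__":
    random.seed(0)
    fs = 16000  # assumed sample rate in Hz
    # Hypothetical stand-in for a neutral, quietly recorded utterance: a 1 s, 440 Hz tone
    clean = [0.5 * math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    noise = [random.gauss(0.0, 1.0) for _ in range(fs)]
    noisy, scaled = mix_at_snr(clean, noise, target_snr_db=-5.0)
    print(f"{measured_snr_db(clean, scaled):.2f} dB")
```

The same clean utterance can be mixed at several SNRs to build a training set. As the abstract points out, this kind of pipeline cannot reproduce the change in speaking style (Lombard speech) that a talker would actually exhibit in that noise.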
On Comparison of Adaptive Regularization Methods
This paper investigates recently suggested adaptive regularization schemes
The Use of Supervisor Feedback and Publicly Posted Feedback to Increase Safety in a Factory Environment
The effects of safety-related and behaviorally relevant verbal and posted feedback from supervisors in a manufacturing plant were evaluated using a multiple-baseline-across-behaviors design. During baseline, plant safety averaged 35.3% for the behaviors and conditions on Checklist 1 and 35.0% for those on Checklist 2. When verbal supervisory feedback was implemented, the plant safety average increased to 50.6% for Checklist 1 and 75.7% for Checklist 2. When posted supervisory feedback was added to the intervention package, the average increased further to 58.0% for Checklist 1 and 83.3% for Checklist 2. These results are consistent with previous findings that performance feedback can increase critical work behaviors.
Keywords: health, injury prevention, behavior-based safety, feedback, accidents.
Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music
In large MP3 databases, files are typically generated with different parameter settings, i.e., bit rates and sampling rates. This is of concern for music information retrieval (MIR) applications, as encoding differences can potentially confound meta-data estimation and similarity evaluation. In this paper we discuss the influence of MP3 coding on the Mel frequency cepstral coefficients (MFCCs). The main result is that the widely used subset of the MFCCs is robust at bit rates equal to or higher than 128 kbit/s, for the implementations we have investigated. However, at lower bit rates, e.g., 64 kbit/s, the implementation of the Mel filter bank becomes an issue.
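The abstract attributes the low-bit-rate sensitivity to how the Mel filter bank is implemented. To show where that design choice sits, here is a pure-Python sketch of one common MFCC pipeline: triangular filters spaced on the HTK-style mel scale, log filter-bank energies, and an orthonormal DCT-II. It is not the implementation evaluated in the paper; the function names, the 40 filters, the 512-point FFT and the 13 retained coefficients are assumptions. Framing, windowing and the FFT itself are omitted, so the functions start from one frame's power spectrum.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (one of several conventions in use)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular (unit-height) filters over the n_fft // 2 + 1 non-negative FFT bins."""
    f_max = f_max if f_max is not None else sample_rate / 2
    n_bins = n_fft // 2 + 1
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 equally spaced mel points give each filter's left, center and right edge
    mel_points = [m_lo + i * (m_hi - m_lo) / (n_filters + 1) for i in range(n_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct_ii(x, n_out):
    """Orthonormal DCT-II of x, keeping the first n_out coefficients."""
    n = len(x)
    out = []
    for k in range(n_out):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def mfcc_from_power_spectrum(power_spectrum, bank, n_coeffs=13, eps=1e-10):
    """MFCCs of one frame, given its power spectrum over the non-negative bins."""
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    log_energies = [math.log(e + eps) for e in energies]  # eps guards against log(0)
    return dct_ii(log_energies, n_coeffs)

if __name__ == "__main__":
    sr, n_fft = 22050, 512
    bank = mel_filter_bank(n_filters=40, n_fft=n_fft, sample_rate=sr)
    flat = [1.0] * (n_fft // 2 + 1)  # hypothetical flat power spectrum
    print(mfcc_from_power_spectrum(flat, bank)[:3])
```

Implementations differ in the mel formula, filter normalization (equal-height versus equal-area triangles), frequency range and number of filters. These are plausibly the kinds of choices the abstract has in mind, although it does not say which variants were compared.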
On Training Targets and Objective Functions for Deep-Learning-Based Audio-Visual Speech Enhancement
Audio-visual speech enhancement (AV-SE) is the task of improving speech
quality and intelligibility in a noisy environment using audio and visual
information from a talker. Recently, deep learning techniques have been adopted
to solve the AV-SE task in a supervised manner. In this context, the choice of
the training target (the quantity to be estimated) and of the objective function
(which quantifies the quality of that estimate) is critical for performance.
This work is the first to present an experimental study of a range of different
targets and objective functions used to train a deep-learning-based AV-SE
system. The results show that the approaches that directly estimate a mask
perform best overall in terms of estimated speech quality and intelligibility,
although the model that directly estimates the log magnitude spectrum performs
equally well in terms of estimated speech quality.
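The abstract contrasts estimating a mask with estimating the log magnitude spectrum directly. The sketch below computes, on per-bin magnitudes, a few targets that are common in the speech-enhancement literature (an ideal amplitude mask, a square-root ideal ratio mask and a log magnitude target) together with two mean-squared-error objectives: one compares masks directly, the other compares the masked noisy magnitude with the clean magnitude. The abstract does not list which targets or objective functions the paper evaluates, so these definitions and all function names are assumptions for illustration.

```python
import math

def ideal_amplitude_mask(clean_mag, noisy_mag, eps=1e-8):
    """IAM: |S| / |Y| per time-frequency bin (can exceed 1)."""
    return [s / (y + eps) for s, y in zip(clean_mag, noisy_mag)]

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Square-root IRM: sqrt(|S|^2 / (|S|^2 + |N|^2)), bounded in [0, 1]."""
    return [math.sqrt(s * s / (s * s + n * n + eps)) for s, n in zip(clean_mag, noise_mag)]

def log_magnitude(mag, eps=1e-8):
    """Log-magnitude spectrum target; eps avoids log(0) in silent bins."""
    return [math.log(m + eps) for m in mag]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mask_approximation_loss(est_mask, clean_mag, noisy_mag):
    """Compare the estimated mask with the ideal amplitude mask."""
    return mse(est_mask, ideal_amplitude_mask(clean_mag, noisy_mag))

def signal_approximation_loss(est_mask, clean_mag, noisy_mag):
    """Apply the estimated mask to the noisy magnitude and compare with the clean magnitude."""
    return mse([m * y for m, y in zip(est_mask, noisy_mag)], clean_mag)

if __name__ == "__main__":
    # Hypothetical magnitudes for three time-frequency bins
    clean_mag = [1.0, 0.5, 0.0]
    noise_mag = [0.0, 0.5, 1.0]
    noisy_mag = [1.0, 0.8, 1.0]
    iam = ideal_amplitude_mask(clean_mag, noisy_mag)
    print(iam, ideal_ratio_mask(clean_mag, noise_mag))
    print(signal_approximation_loss(iam, clean_mag, noisy_mag))  # close to 0 for the ideal mask
```

In this framing, a network trained toward a mask target outputs `est_mask`, whereas a log-magnitude model regresses `log_magnitude(clean_mag)` directly and needs no mask.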
Pruning the vocabulary for better context recognition
Language-independent 'bag-of-words' representations are surprisingly effective for text classification. The representation is high-dimensional, however, and contains many non-consistent words for text categorization. These non-consistent words reduce the generalization performance of subsequent classifiers, e.g., through ill-posed principal component transformations. In this communication our aim is to study the effect of removing the least relevant words from the bag-of-words representation. We consider a new approach that uses neural-network-based sensitivity maps and information gain to determine term relevancy when pruning the vocabularies. With the reduced vocabularies, documents are classified using a latent semantic indexing representation and a probabilistic neural network classifier. Reducing the bag-of-words vocabularies by 90% to 98%, we find consistent classification improvements on two mid-size data sets. We also study the applicability of information gain and sensitivity maps for automated keyword generation.
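The abstract ranks terms with information gain together with neural-network sensitivity maps. The sketch below covers only the information-gain half. It scores each term by how much knowing whether the term occurs in a document reduces the entropy of the class label, then keeps the top fraction of the vocabulary; a 90% to 98% reduction corresponds to `keep_fraction` between 0.10 and 0.02. Documents are modeled as sets of words, and the toy corpus and function names are hypothetical. The sensitivity-map ranking and the LSI and probabilistic neural network classifier are not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in document' with respect to the class labels."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = 0.0
    for subset in (with_term, without_term):
        if subset:
            conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def prune_vocabulary(docs, labels, keep_fraction):
    """Keep the top keep_fraction of terms ranked by information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    n_keep = max(1, round(keep_fraction * len(vocab)))
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return set(ranked[:n_keep]), scores

if __name__ == "__main__":
    docs = [{"goal", "match", "the"}, {"goal", "team", "the"},
            {"vote", "party", "the"}, {"vote", "election", "the"}]
    labels = ["sport", "sport", "politics", "politics"]
    kept, scores = prune_vocabulary(docs, labels, keep_fraction=0.25)
    print(sorted(kept))  # ['goal', 'vote']
```

A term that appears in every document, such as "the" here, gets zero information gain and is among the first to be pruned.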